Introduction to “Introduction to Data Science”

Erik Fredner

2024-08-26

Outline

  1. Getting to know each other
  2. Course overview
  3. Syllabus
  4. Install stuff
  5. Next time

Getting to know each other

Introductions

  1. Preferred name
  2. Pronouns
  3. Something you dislike

Example:

  1. Professor Fredner
  2. He/him
  3. Gushers

Introduce me to UR

  • This is my first semester at UR!
  • How would you describe the culture of UR?
    • Discuss with the people next to you.

Anonymous survey

Please complete this brief anonymous survey: https://richmond.ca1.qualtrics.com/jfe/form/SV_8vsOr6mRZGypJVc

This link was included with the announcement you received on Sunday.

I will use the survey results to design the class.

Course Overview

What do we mean by “data science”?

  • We will learn and practice a series of methods for organizing, collecting, visualizing, manipulating, and exploring different kinds of data.
  • We focus on the creation of data and application of methods, not theoretical or foundational questions.
  • This is not a mathematics course, nor will it resemble a traditional introductory statistics class. We will spend the entire semester writing code to apply data science concepts.

Why do data science? One perspective:

  • There is too much information in the world.
    • e.g., every minute, approximately 500 hours of video are uploaded to YouTube.
  • People value useful information and new knowledge.
    • Almost none of those 500 hours are worth your finite time.
  • Data science transforms data into useful information.
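To get a feel for the scale of that YouTube figure, here is a rough back-of-the-envelope calculation written in R, which you can try once R is installed. It treats the approximate 500 hours per minute quoted above as exact, and the variable names are made up for illustration.

```r
# Assumption: the upload rate is the approximate figure quoted above.
hours_uploaded_per_minute <- 500

# Hours of video uploaded in one day (60 minutes x 24 hours).
hours_uploaded_per_day <- hours_uploaded_per_minute * 60 * 24

# Years it would take one person to watch a single day of uploads nonstop.
years_to_watch_one_day <- hours_uploaded_per_day / (24 * 365)

hours_uploaded_per_day
round(years_to_watch_one_day, 1)
```

Under that assumption, one day of uploads comes to 720,000 hours, or roughly 82 years of continuous viewing.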

How do people do data science?

  • R and Python are the two most popular programming languages for data science.
  • We will be using R in this class.
  • However, the main learning goal of this class is not R.
  • The main learning goal is to understand:
    • what good data is
    • how to ask good questions of data
    • how we can use good data to answer good questions

What does the data science process look like?

flowchart LR
    A[Import] --> B[Tidy]
    B --> C[Transform]
    subgraph Understand
        direction LR
        C --> D[Visualize]
        D --> E[Model]
        E --> C
    end
    Understand --> F[Communicate]
    subgraph Program
        direction LR
        A
        B
        Understand
        F
    end

from R for Data Science (2023)
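As a rough illustration of how those stages might map onto code, here is a minimal sketch in R. It is not the course's method: it uses only base R and the built-in mtcars data set so that it runs before the tidyverse is installed, and the variable names (cars, wt_kg, fit, slope) are chosen just for this example.

```r
# Import: mtcars ships with R and stands in for a file you would
# normally read from disk.
cars <- mtcars

# Tidy: move the car names out of the row names into an ordinary column,
# so that every variable has its own column.
cars$model <- rownames(cars)
rownames(cars) <- NULL

# Transform: keep the columns we need and add a new one.
# wt is recorded in units of 1,000 pounds; 1 pound is about 0.4536 kg.
cars <- cars[, c("model", "mpg", "wt")]
cars$wt_kg <- cars$wt * 1000 * 0.4536

# Visualize: scatterplot of fuel economy against weight.
plot(cars$wt_kg, cars$mpg,
     xlab = "Weight (kg)", ylab = "Miles per gallon")

# Model: fit a straight line describing the relationship.
fit <- lm(mpg ~ wt_kg, data = cars)

# Communicate: report the estimated slope in plain language.
slope <- coef(fit)[["wt_kg"]]
cat("Each additional 100 kg is associated with a change of",
    round(slope * 100, 2), "mpg\n")
```

In practice the Transform, Visualize, and Model steps repeat: a plot or a model often suggests a new transformation, which is the loop drawn inside the Understand box.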

An effective example

Syllabus

Questions about syllabus?

The elephant in the room

  • No one knows the best course of action for students or professors regarding GenAI (e.g., ChatGPT).
  • As a result, you will probably get different instructions from all of your professors about it.
  • Use of GenAI is not required in this course.
  • But it is already widespread in all domains of programming, including data science, so it is not prohibited either.

Custom GPT

  • I created a Custom GPT for this course that already knows about the course requirements.
  • When you run into trouble on homework, please work through these steps in order:
    1. Read the notes.
    2. Talk to your classmates.
    3. Search for credible information online.
    4. Ask the Custom GPT for help.
    5. Cite your interaction with the Custom GPT.

Install stuff

Step 1: R

Install the version for your operating system: https://cloud.r-project.org

If you already have R installed, install the newest version.

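If you have a Mac with an Apple Silicon chip (M1 or later, which includes most Macs sold since late 2020), choose “Apple Silicon.” If you are not sure, check Apple menu > About This Mac: a chip name beginning with “Apple” means Apple Silicon, while “Intel” means you need the Intel version.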

Step 2: RStudio

Install the free version of RStudio Desktop: https://posit.co/downloads/

If you already have RStudio installed, install the newest version.

Step 3: Test it

  1. Open RStudio.
  2. Type the following in the R console (bottom left quadrant with the >):
print("Hello, world!")

Note that you must get the parentheses () and quotes "" exactly right.

  3. Press Enter or Return on your keyboard.

You should see this in the console:

[1] "Hello, world!"
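If you want to confirm which version of R RStudio is running, the following optional check may help. It relies only on values built into R; the exact text you see depends on your installation.

```r
# R.version.string is a built-in value describing the running version of R.
R.version.string

# R.version$arch reports the processor architecture this copy of R was built for.
# On an Apple Silicon build it is typically "aarch64"; on Intel builds, "x86_64".
R.version$arch
```

If the version shown is older than the one currently offered on the R download page, reinstalling R as described in Step 1 should update it.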

Step 4: Install the tidyverse

Type the following in the console:

install.packages("tidyverse")

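R will download and install a long list of packages, and a lot of text will scroll through the console. This can take several minutes.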

You only need to let me know if you get an Error.
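One way to check whether the installation worked is sketched below. It assumes nothing beyond base R; the variable name tidyverse_installed is made up for this example.

```r
# requireNamespace() returns TRUE if the package can be loaded and
# FALSE otherwise, without stopping with an error.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  message("tidyverse is installed, version ",
          as.character(packageVersion("tidyverse")))
} else {
  message("tidyverse was not found; try install.packages(\"tidyverse\") again")
}
```

If the second message appears, run the install command again and read the console output for a line beginning with Error.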

Step 4.5: Debugging

  • If you ran into a problem during the installation, raise your hand.
  • If you finished installing everything successfully and think you can help others, go to the closest person with their hand up.

Step 5: Download materials

  • Go to Blackboard / Course Documents / 00 - Intro
  • Download DSST289.zip
  • Extract this .zip file (on most computers, double-click it or right-click and choose “Extract All”)
  • Move the extracted folder to a location on your computer that you can find easily (e.g., Desktop)

With our remaining time

  1. Read the notes for next class: Blackboard > Course Documents > 01 - Tabular Data > notes
  2. Complete the questions at the end of the notes.
  3. Yes, you may work with your classmates on homework! I recommend that you exchange contact information.
    • If someone wants to take the lead on organizing a full-class study group on WhatsApp or whatever, go for it.
  4. Upload your completed questions to the Blackboard assignment “homework upload” before next class.